YAC - A Recursive Chunker for Unrestricted German Text
Abstract
YAC is a fully automatic recursive chunker for unrestricted German text. It is designed specifically to provide a useful basis for extracting linguistic as well as lexicographic information. Consequently, the grammar rules of YAC are implemented so that the resulting analysis meets the needs of a subsequent extraction process. The chunks provided by YAC are continuous parts of intra-clausal constituents; they may include recursion but contain no PP-attachment or sentential elements. The chunks are additionally enriched with information about head lemma, morpho-syntactic features, and certain lexical and structural properties.
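To make the description of the output more concrete, the following is a minimal Python sketch of how a recursive chunk annotated with a head lemma and morpho-syntactic features might be represented. It is not YAC's actual data format or implementation: the class names `Token` and `Chunk`, the category labels `NC` and `AC`, the feature keys, and the example phrase are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Token:
    """A single word with its lemma and part-of-speech tag."""
    word: str
    lemma: str
    pos: str


@dataclass
class Chunk:
    """A continuous chunk that may contain tokens and nested chunks."""
    category: str  # hypothetical labels, e.g. "NC" (noun chunk), "AC" (adjective chunk)
    children: List[Union["Chunk", Token]]
    head_lemma: str
    features: Dict[str, str] = field(default_factory=dict)

    def tokens(self) -> List[Token]:
        """Return all tokens covered by this chunk, in surface order."""
        out: List[Token] = []
        for child in self.children:
            if isinstance(child, Chunk):
                out.extend(child.tokens())
            else:
                out.append(child)
        return out

    def depth(self) -> int:
        """Return the nesting depth; a chunk without nested chunks has depth 1."""
        nested = [c.depth() for c in self.children if isinstance(c, Chunk)]
        return 1 + max(nested, default=0)


# Illustrative example (assumed, not taken from the paper): "der sehr kleine Hund"
ac = Chunk(
    category="AC",
    children=[Token("sehr", "sehr", "ADV"), Token("kleine", "klein", "ADJA")],
    head_lemma="klein",
)
nc = Chunk(
    category="NC",
    children=[Token("der", "der", "ART"), ac, Token("Hund", "Hund", "NN")],
    head_lemma="Hund",
    features={"case": "nom", "number": "sg", "gender": "masc"},
)

print([t.word for t in nc.tokens()], nc.head_lemma, nc.depth())
```

In a representation of this kind, an extraction step could filter chunks by `head_lemma` or by feature values without re-parsing the text, which is the sort of use the abstract has in mind.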
Similar resources
Robust German Noun Chunking With a Probabilistic Context-Free Grammar
We present a noun chunker for German which is based on a head-lexicalised probabilistic context-free grammar. A manually developed grammar was semi-automatically extended with robustness rules in order to allow parsing of unrestricted text. The model parameters were learned from unlabelled training data by a probabilistic context-free parser. For extracting noun chunks, the parser generates all ...
Annotation, storage, and retrieval of mildly recursive structures
This paper describes an unusual approach to the partial syntactic analysis of unrestricted German text. Unlike most other chunk parsers, which are specially designed and implemented for the single purpose of annotating syntactic structures, YAC is the result of a slow evolution from on-line to off-line analysis. As we formulated increasingly complex queries in the CQP query language, some of wh...
Alignment-Guided Chunking
We introduce an adaptable monolingual chunking approach, Alignment-Guided Chunking (AGC), which makes use of knowledge of word alignments acquired from bilingual corpora. Our approach is motivated by the observation that a sentence should be chunked differently depending on the foreseen end-tasks. For example, given the different requirements of translation into (say) French and German, it is inappro...
An Affinity Based Greedy Approach towards Chunking for Indian Languages
A robust chunker can drastically reduce the complexity of parsing natural language text. Chunking for Indian languages requires a novel approach because of the relatively unrestricted order of words within a word group. A computational framework for chunking based on valency theory and feature structures is described here. The paper also draws an analogy of chunk formation in free word ...
Analysis of German Patent Literature
We show how several components of the JET natural language analysis tool, originally developed at New York University for the analysis of English text, were adapted to German. These components, such as the part-of-speech tagger and the noun chunker, are explained in terms that should be understandable to a layman. On the other hand, issues that arise specifically with regard to the German langu...